1 Mount Hood Environmental, PO Box 1303, Challis, Idaho, 83226, USA
2 Mount Hood Environmental, 39085 Pioneer Boulevard #100 Mezzanine, Sandy, Oregon, 97055, USA
3 Mount Hood Environmental, PO Box 4282, McCall, Idaho, 83638, USA

Correspondence: Bryce N. Oldemeyer, Mark Roes

1 Background

Quantile random forest (QRF) models have become an increasingly popular tool for quantifying freshwater habitat carrying capacity because their flexible framework avoids common pitfalls associated with noisy data, correlated variables, and non-linear relationships. Recently, three QRF models were developed using large fish-habitat datasets (CHaMP dataset) within the Columbia River Basin to estimate carrying capacity for ESA-listed populations of Chinook salmon and steelhead during three critical life stages (juvenile summer parr, juvenile winter presmolt, and adult redds). The covariates included in those models were selected from >100 potential covariates and chosen for their high predictive power to estimate capacity across the Columbia River Basin (cite IRA, See et al 2021, or another document outlining the original QRF selection process). However, a subset of the covariates included in the QRF models are not useful for restoration project monitoring, are not informative for describing target conditions for restoration design because they cannot be manipulated by project actions, and are difficult to replicate or measure using streamlined fish habitat protocols (DASH - Carmichael et al. 2019). To increase the utility of the QRF model for project monitoring, project design, and future data collection efforts, we explored including alternative covariates in the QRF models that: 1) maintained high predictive power, 2) were informative for restoration efforts and monitoring, 3) could be calculated from DASH surveys, 4) were not missing an overabundance of data in the fish-habitat dataset, and 5) were not highly correlated with other covariates in the models (to avoid overfitting the models). Additionally, we wanted to test the assumption made during the development of the original QRF model that a single model was appropriate for both Chinook salmon and steelhead during each of the three life stages.

Similarly, a random forest (RF) model has been used to predict habitat capacity estimates across larger spatial scales where CHaMP and/or DASH data aren’t available (cite IRA). We revisited the globally available attributes (GAAs) included in the original RF extrapolation model and made minor modifications to the extrapolation model that: 1) maintained covariates with high predictive power and 2) included covariates that better aligned with the revised QRF model covariates. To compare relative performance between the original and modified RF extrapolation models, we evaluated watershed carrying capacity estimates produced by the two models for eight watersheds located within the Upper Salmon River basin.

Through this process, we successfully developed three modified QRF models that were more informative for restoration design and monitoring, included covariates that could be calculated using newly developed stream habitat protocols, and maintained a similar level of predictive power to the original QRF habitat capacity models. Below is a brief document outlining these efforts.

2 Covariate selection process

Potential habitat covariates for the QRF models were generated from the CHaMP dataset or obtained from other sources (e.g. NorWest stream temperature data). In total, 129 covariates were included in the selection process. Covariates were aggregated into eleven metric categories and 1-4 metrics were chosen from each category based on the rubric below.

  1. How strong was the association between the covariate and the response variable (based on MIC score)?

  2. Could the covariate be calculated using DASH data?

  3. Was the covariate informative for restoration efforts?

  4. How much data were missing, and how many values were “0”, for the covariate in the fish-habitat dataset?

  5. How correlated was the covariate with other covariates within the same metric category, particularly with covariates with higher MIC scores?

An oversimplified example of how a theoretical covariate might be selected for a model is described as follows.

In the original QRF model, discharge was likely selected as a covariate because it had a high MIC score and it made biological sense (i.e. discharge is a significant factor impacting fish habitat use and, presumably, habitat carrying capacity). Unfortunately, discharge isn’t very informative for restoration efforts because most restoration actions can’t create water. Discharge, like many habitat covariates, is highly correlated with other habitat covariates, but these other covariates may have been left out of the original QRF model for any number of reasons (highly correlated with other covariates already in the model, excluded to avoid overfitting the model, etc.). Applying the rubric shows that average thalweg depth has a MIC score nearly as high as that of discharge, is informative for restoration efforts, can be calculated with DASH, and is highly correlated with discharge (the high correlation is likely why average thalweg depth was left out of the original QRF model). Based on all the information above, average thalweg depth would be substituted for discharge in the model. This process would be repeated for all the remaining covariates for that model.
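
The rubric and the discharge example can be sketched as a simple screening pass. This is only an illustration: the actual selection relied on judgment rather than fixed cutoffs, and the MIC scores, missing-data fractions, correlations, and thresholds below are hypothetical values chosen to mirror the example, not values from the fish-habitat dataset.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    category: str
    mic: float             # rubric 1: MIC score (hypothetical values here)
    dash_ok: bool          # rubric 2: can be calculated from DASH data
    restoration_ok: bool   # rubric 3: informative for restoration efforts
    frac_missing: float    # rubric 4: fraction of missing or zero records


def select_covariates(candidates, corr, max_missing=0.2, max_corr=0.7,
                      per_category=4):
    """Greedy pass: highest MIC first, skipping candidates that fail the rubric.

    `corr` maps frozenset({name_a, name_b}) to a correlation coefficient.
    All thresholds are illustrative assumptions, not values from the report.
    """
    chosen, counts = [], {}
    for c in sorted(candidates, key=lambda c: c.mic, reverse=True):
        if not (c.dash_ok and c.restoration_ok):
            continue
        if c.frac_missing > max_missing:
            continue
        if counts.get(c.category, 0) >= per_category:
            continue
        # rubric 5: avoid covariates highly correlated with one already chosen
        # in the same metric category
        redundant = any(
            k.category == c.category
            and abs(corr.get(frozenset((c.name, k.name)), 0.0)) > max_corr
            for k in chosen
        )
        if redundant:
            continue
        chosen.append(c)
        counts[c.category] = counts.get(c.category, 0) + 1
    return [c.name for c in chosen]


candidates = [
    Candidate("Discharge", "Size", 0.45, True, False, 0.01),
    Candidate("Average Thalweg Depth", "Size", 0.42, True, True, 0.02),
    Candidate("Residual Pool Depth", "Size", 0.30, True, True, 0.05),
]
corr = {
    frozenset(("Discharge", "Average Thalweg Depth")): 0.85,
    frozenset(("Average Thalweg Depth", "Residual Pool Depth")): 0.50,
}
selected = select_covariates(candidates, corr)
```

With these made-up inputs, discharge is dropped because it is not informative for restoration, and average thalweg depth takes its place.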

Last, the covariate selection process was done independently for both species for all three life stages to test the assumption made during the original QRF model development that the top covariates for the three life stages were the same between species.

3 Covariate selection results

There were 12-14 covariates selected for each of the six models. While the relative importance of the final covariates in the three life stage models differed between species, the final covariates themselves were nearly identical (Figure 3.1, Figure 3.2, and Figure 3.3). Because of this, we consolidated the species-specific models into single juvenile winter, juvenile summer, and redd models to be used for both species (Table 3.1).

Figure 3.1: Relative importance plots for covariates included in the juvenile summer QRF models

Figure 3.2: Relative importance plots for covariates included in the juvenile winter QRF models

Figure 3.3: Relative importance plots for covariates included in the QRF redds models

Table 3.1: Habitat covariates and their descriptions used in the three life stage QRF capacity models. Numbers indicate where each metric ranked in relative importance for each species. Dots indicate a metric was not used for a given model.
Name Metric Category Juv Sum Chnk Juv Sum Sthd Juv Win Chnk Juv Win Sthd Redds Chnk Redds Sthd Description
Channel Unit Frequency ChannelUnit 5 9 5 3 1 1 Number of channel units per 100 meters.
Fast NonTurbulent Frequency ChannelUnit 6 13 13 4 Number of Fast Water Non-Turbulent channel units per 100 meters.
Sinuosity Complexity 13 7 10 10 10 12 Ratio of the thalweg length to the straight line distance between the start and end points of the thalweg.
Wetted Channel Braidedness Complexity 14 14 13 13 Ratio of the total length of the wetted mainstem channel plus side channels and the length of the mainstem channel.
Fish Cover: Some Cover Cover 8 4 8 8 9 3 Percent of wetted area with some form of fish cover
Large Wood Density Cover 4 5 Large Wood per sq meter
Residual Depth Size 2 2 Average residual depth of the channel unit.
Average Thalweg Depth Size 1 3 2 2 Average Thalweg Depth, meters
Thalweg Exit Depth Avg Size 6 7 Depth of the thalweg at the downstream edge of the channel unit.
Gradient Size 3 2 7 1 4 6 Site water surface gradient is calculated as the difference between the top of site (upstream) and bottom of site (downstream) water surface elevations divided by thalweg length.
Residual Pool Depth Size 12 10 11 5 The average difference between the maximum depth and downstream end depth of all Slow Water/Pool channel units.
Discharge Size 3 4 The sum of station discharge across all stations. Station discharge is calculated as depth x velocity x station increment for all stations except first and last. Station discharge for first and last station is 0.5 x station width x depth x velocity.
Substrate Est: Boulders Substrate 10 12 8 11 Percent of boulders (256-4000 mm) within the wetted site area.
Substrate Est: Cobble and Boulder Substrate 11 11 Total cobble plus boulder percentage
Substrate Est: Cobbles Substrate 11 6 5 8 Percent of cobbles (64-256 mm) within the wetted site area.
Substrate Est: Coarse and Fine Gravel Substrate 7 8 12 12 7 13 Percent of coarse and fine gravel (2-64 mm) within the wetted site area.
Substrate Est: Sand and Fines Substrate 9 5 9 9 6 7 Percent of sand and fine sediment (0.01-2 mm) within the wetted site area.
Avg. August Temperature Temperature 2 1 3 10 Average predicted daily August temperature from NorWest, averaged across the years 2002-2011.
Elevation Temperature 1 6 Elevation, meters
Large Wood Frequency: Wetted Wood 4 11 12 9 Number of large wood pieces per 100 meters within the wetted channel.
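
Several of the descriptions in Table 3.1 are simple ratios or sums, and a minimal sketch of four of them is given below. The function names, argument names, and example numbers are illustrative only. The discharge function assumes a uniform station width that also serves as the station increment, and the gradient is returned as a unitless ratio because the description does not state a unit.

```python
def channel_unit_frequency(n_units, site_length_m):
    """Number of channel units per 100 meters of site length."""
    return 100.0 * n_units / site_length_m


def sinuosity(thalweg_length_m, straight_line_m):
    """Thalweg length divided by the straight-line distance between its ends."""
    return thalweg_length_m / straight_line_m


def gradient(top_elev_m, bottom_elev_m, thalweg_length_m):
    """Water surface elevation drop divided by thalweg length (unitless)."""
    return (top_elev_m - bottom_elev_m) / thalweg_length_m


def discharge(depths_m, velocities_ms, station_width_m):
    """Sum of station discharges; first and last stations get half weight.

    Assumes equally spaced stations, so one width stands in for both the
    'station increment' and the 'station width' named in Table 3.1.
    """
    n = len(depths_m)
    if n != len(velocities_ms) or n < 2:
        raise ValueError("need matching depth and velocity lists, length >= 2")
    total = 0.0
    for i, (d, v) in enumerate(zip(depths_m, velocities_ms)):
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * station_width_m * d * v
    return total


# Hypothetical site: 12 channel units over a 240 m site, four flow stations.
cu_freq = channel_unit_frequency(12, 240.0)
sin_ratio = sinuosity(260.0, 200.0)
grad = gradient(1502.4, 1500.0, 240.0)
q = discharge([0.1, 0.3, 0.4, 0.2], [0.2, 0.5, 0.6, 0.3], 0.5)
```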

4 Extrapolation model

The spatial extent of QRF capacity predictions is limited to reaches with high-resolution habitat data (i.e. CHaMP or DASH data), so an extrapolation model was developed to estimate habitat capacity for the Columbia River Basin using “globally available attributes” (GAAs) obtained from a continuous, linear stream network created by Morgan Bond and Tyler Nodine based on the National Hydrography Dataset High Resolution 1:24,000. Using the GAAs from the linear stream network and a random forest model structure, we generated capacity estimates at the 200 meter reach scale for the entire Columbia River Basin. Consistent with the QRF model, the extrapolation model makes no assumptions about the direction and distribution of effects of predictors, and constrains density estimates within the range of predictions produced by the QRF model. However, random forest methods do not account for variable strata weights across the CHaMP dataset, a source of potential bias that could be alleviated through the collection of additional paired fish and habitat data.

Extrapolation model covariates were selected from the list of GAAs and evaluated for inclusion by examining relative importance plots, partial dependence plots, and correlation between covariates. We used the covariates included in the previous extrapolation as a starting point for selection. This resulted in the replacement of regime (an indicator of dominant precipitation type) with elevation and the removal of relative slope, which we found was redundant with gradient. Model results indicated that elevation was consistently one of the most important predictors in the model. This is particularly true for the Chinook parr summer model, where capacity predictions were primarily driven by elevation.
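
A redundancy check of the kind that flagged relative slope against gradient could look like the sketch below. It is an assumption that Pearson correlation was the measure used. The reach values and the 0.9 threshold are hypothetical and serve only to show the mechanics.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical GAA values for five reaches (not taken from the stream network).
gaas = {
    "gradient": [0.5, 1.2, 2.0, 3.1, 4.4],
    "relative_slope": [0.4, 1.1, 2.2, 3.0, 4.5],
    "elevation": [1900, 1200, 1750, 1300, 1600],
}
flagged = redundant_pairs(gaas)
```

In this toy data only the gradient and relative slope pair exceeds the threshold, so one of the two would be a candidate for removal.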

Table 4.1: Globally available attributes (GAAs) and their descriptions used in the random forest extrapolation model.
Metric Description
Gradient % Stream gradient (%).
Sinuosity Reach sinuosity. 1 = straight, > 1 = sinuous.
Alpine accumulation Number of upstream cells in alpine terrain.
Fines accumulation Number of upstream cells in fine grain lithologies.
Flow accumulation Number of upstream DEM cells flowing into reach.
Gravel accumulation Number of upstream cells in gravel producing lithologies.
Precipitation accumulation Number of upstream cells weighted by average annual precipitation.
Floodplain width Current unmodified floodplain width.
Avg Aug stream temperature Historical composite scenario representing 10 year average August mean stream temperatures for 2002-2011 (Isaak et al. 2017).
Disturbance PCA 1 Disturbance Classification PCA 1 Score (Whittier et al. 2011).
Natural PCA 1 Natural Classification PCA 1 Score (Whittier et al. 2011).
Natural PCA 2 Natural Classification PCA 2 Score (Whittier et al. 2011).
Elevation Elevation at downstream end of reach

Figure 4.1: Extrapolations of habitat capacity for Chinook salmon, by life-stage, for the eight watersheds within the Upper Salmon River Basin using the modified models.

Figure 4.2: Extrapolations of habitat capacity for steelhead, by life-stage, for the eight watersheds within the Upper Salmon River Basin using the modified models.

5 Habitat Capacity Estimates

5.0.1 Chinook Salmon

Table 5.1: Predicted Chinook salmon habitat capacity per kilometer by life-stage and watershed using the modified models.
Watershed Juv summer capacity/km Summer SE/km Juv winter capacity/km Winter SE/km Redd capacity/km Redd SE/km
EF Salmon 12,335 1,452.9 885 210.5 3 0.1
Lemhi 5,766 459.4 1,038 112.6 3 0.1
NF Salmon 6,504 961.3 1,351 199.5 3 0.1
Pahsimeroi 5,146 357.3 1,689 189.8 3 0.1
Panther Cr 8,544 829.3 1,410 156.2 3 0.1
Upper Salmon 17,082 1,823.5 862 235.9 3 0.1
Valley Cr 15,833 1,726.0 961 270.8 3 0.2
Yankee Fork 14,967 1,916.6 833 200.9 3 0.2

5.0.2 Steelhead

Table 5.2: Predicted steelhead habitat capacity by life-stage and watershed using the modified models.
Watershed Juv summer capacity Summer SE Juv winter capacity Winter SE Redd capacity Redd SE
EF Salmon 252,597 15,520.5 337,682 36,795 413 24
Lemhi 310,577 9,082.3 363,898 27,441 441 18
NF Salmon 242,471 18,381.8 313,118 27,955 323 22
Pahsimeroi 159,705 6,225.1 205,921 13,951 198 8
Panther Cr 268,476 13,598.0 339,671 19,946 317 15
Upper Salmon 243,548 14,843.6 310,879 39,013 452 32
Valley Cr 176,048 10,707.6 288,579 31,329 365 26
Yankee Fork 197,926 12,378.9 341,310 38,555 449 36
Table 5.2: Predicted steelhead habitat capacity per kilometer by life-stage and watershed using the modified models.
Watershed Juv summer capacity/km Summer SE/km Juv winter capacity/km Winter SE/km Redd capacity/km Redd SE/km
EF Salmon 1,525 93.7 2,039 222.2 2 0.1
Lemhi 1,774 51.9 2,079 156.8 3 0.1
NF Salmon 2,049 155.3 2,646 236.2 3 0.2
Pahsimeroi 1,924 75.0 2,481 168.1 2 0.1
Panther Cr 2,105 106.6 2,664 156.4 2 0.1
Upper Salmon 1,485 90.5 1,895 237.8 3 0.2
Valley Cr 1,465 89.1 2,401 260.7 3 0.2
Yankee Fork 1,249 78.1 2,154 243.4 3 0.2
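
The total and per-kilometer steelhead tables are linked by the stream length within each watershed. That length is not reported here, so the sketch below backs it out from the EF Salmon juvenile summer values and checks that the other per-kilometer entries are consistent with it. It uses the one-decimal per-kilometer value from Table 5.4 (1,525.4), and the implied length is approximate because the inputs are rounded.

```python
def per_km(total, stream_km):
    """Convert a watershed total to a per-kilometer value."""
    return total / stream_km


# EF Salmon steelhead values from the tables above and Table 5.4.
summer_total = 252_597
summer_se_total = 15_520.5
summer_per_km = 1_525.4
winter_total = 337_682

# Stream length implied by the juvenile summer total and per-km capacity.
implied_km = summer_total / summer_per_km

# Per-km values recomputed from totals using the implied length.
summer_se_per_km = per_km(summer_se_total, implied_km)
winter_per_km = per_km(winter_total, implied_km)

print(f"implied length: {implied_km:.1f} km")
print(f"summer SE/km:   {summer_se_per_km:.1f}")
print(f"winter cap/km:  {winter_per_km:.1f}")
```

The recomputed summer SE per kilometer and winter capacity per kilometer agree with the tabulated 93.7 and 2,039 to within rounding.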

5.1 Comparison with previous extrapolation

Comparisons of watershed capacity estimates from the previous QRF and extrapolation models with those from the revised versions reveal modest differences in most cases, with the exception of Chinook parr summer capacities in several watersheds. The substantial increases in Chinook parr summer capacity, which range from 21% to 222% compared to the previous extrapolation, are likely due to the inclusion of elevation in the extrapolation model.
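
The percent change columns in the tables below are assumed here to be computed as (revised - previous) / previous x 100, since the formula is not stated. Under that assumption, the previous estimate can be approximately recovered from a revised total and its percent change, as sketched for Yankee Fork Chinook juvenile summer capacity (2,144,056 with a 222% change). The result is approximate because the tabulated percentages are rounded to whole numbers.

```python
def percent_change(revised, previous):
    """Percent change of the revised estimate relative to the previous one."""
    return 100.0 * (revised - previous) / previous


def implied_previous(revised, pct_change):
    """Back out the previous estimate from a revised value and percent change."""
    return revised / (1.0 + pct_change / 100.0)


# Yankee Fork, Chinook juvenile summer (Table 5.3).
revised_total = 2_144_056
reported_change = 222

previous_total = implied_previous(revised_total, reported_change)
check = percent_change(revised_total, previous_total)

print(f"implied previous capacity: {previous_total:,.0f}")
print(f"recomputed percent change: {check:.0f}%")
```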

5.1.1 Chinook

Figure 5.1: Comparison of Chinook salmon habitat capacity estimates between revised and original model extrapolation, by life-stage, for the eight watersheds within the Upper Salmon River Basin.

Table 5.3: Estimated Chinook salmon capacities and comparison with previous random forest extrapolations for eight watersheds.
Model Watershed Capacity per km Total capacity Capacity % change Capacity SE
Juv summer EF Salmon 12,335.5 1,926,623 112 226,926
Juv summer Lemhi 5,765.9 786,452 112 62,660
Juv summer NF Salmon 6,503.6 339,275 13 50,148
Juv summer Pahsimeroi 5,145.6 265,099 45 18,409
Juv summer Panther Cr 8,543.7 1,219,542 21 118,369
Juv summer Upper Salmon 17,081.6 3,301,286 163 352,419
Juv summer Valley Cr 15,832.8 1,902,198 152 207,363
Juv summer Yankee Fork 14,967.3 2,144,056 222 274,556
Juv winter EF Salmon 884.9 138,214 0 32,880
Juv winter Lemhi 1,037.5 141,515 -8 15,359
Juv winter NF Salmon 1,350.7 70,462 28 10,409
Juv winter Pahsimeroi 1,688.7 86,999 -8 9,781
Juv winter Panther Cr 1,410.0 201,265 29 22,296
Juv winter Upper Salmon 861.6 166,522 -29 45,582
Juv winter Valley Cr 961.5 115,517 -12 32,535
Juv winter Yankee Fork 832.8 119,298 20 28,783
Redds EF Salmon 2.6 402 -13 21
Redds Lemhi 2.6 353 5 11
Redds NF Salmon 3.2 166 -5 8
Redds Pahsimeroi 2.7 139 25 4
Redds Panther Cr 3.1 448 -4 17
Redds Upper Salmon 3.0 575 -20 29
Redds Valley Cr 3.3 394 -29 20
Redds Yankee Fork 3.1 438 -38 23

5.1.2 Steelhead

Figure 5.2: Comparison of steelhead habitat capacity estimates between revised and original model extrapolation, by life-stage, for the eight watersheds within the Upper Salmon River Basin.

Table 5.4: Estimated steelhead capacities and comparison with previous random forest extrapolations for eight watersheds.
Model Watershed Capacity per km Total capacity Capacity % change Capacity SE
Juv summer EF Salmon 1,525.4 252,597 -31 15,521
Juv summer Lemhi 1,774.2 310,577 -15 9,082
Juv summer NF Salmon 2,048.7 242,471 -5 18,382
Juv summer Pahsimeroi 1,924.2 159,705 -18 6,225
Juv summer Panther Cr 2,105.3 268,476 -8 13,598
Juv summer Upper Salmon 1,484.6 243,548 -31 14,844
Juv summer Valley Cr 1,465.0 176,048 -28 10,708
Juv summer Yankee Fork 1,249.4 197,926 -29 12,379
Juv winter EF Salmon 2,039.2 337,682 -14 36,795
Juv winter Lemhi 2,078.7 363,898 -8 27,441
Juv winter NF Salmon 2,645.6 313,118 -1 27,955
Juv winter Pahsimeroi 2,481.0 205,921 -4 13,951
Juv winter Panther Cr 2,663.6 339,671 8 19,946
Juv winter Upper Salmon 1,895.1 310,879 -26 39,013
Juv winter Valley Cr 2,401.4 288,579 -14 31,329
Juv winter Yankee Fork 2,154.4 341,310 -18 38,555
Redds EF Salmon 2.5 413 -13 24
Redds Lemhi 2.5 441 10 18
Redds NF Salmon 2.7 323 -10 22
Redds Pahsimeroi 2.4 198 2 8
Redds Panther Cr 2.5 317 -7 15
Redds Upper Salmon 2.8 452 -11 32
Redds Valley Cr 3.0 365 -20 26
Redds Yankee Fork 2.8 449 -25 36

6 Supplemental figures and tables

6.1 QRF partial dependence plots

Figure 6.1: Partial dependence plots for covariates included in the juvenile summer QRF models

Figure 6.2: Partial dependence plots for covariates included in the juvenile summer QRF models

Figure 6.3: Partial dependence plots for covariates included in the juvenile winter QRF models

Figure 6.4: Partial dependence plots for covariates included in the juvenile winter QRF models

Figure 6.5: Partial dependence plots for covariates included in the QRF redds models

Figure 6.6: Partial dependence plots for covariates included in the QRF redds models

6.2 Capacity by stream

6.2.1 Chinook

6.2.2 Steelhead